Abstract: In this paper we introduce two variational equalities of directed information, which are analogous to those of mutual information employed in the Blahut-Arimoto Algorithm (BAA). Subsequently, we define the nonanticipative Rate Distortion Function (RDF) $R^{na}_{0,n}(D)$ using directed information, and establish its equivalence to Gorbunov-Pinsker's nonanticipatory $\epsilon$-entropy $R^{\varepsilon}_{0,n}(D)$. Next, we show existence of the infimizing reproduction distribution, and we derive its closed form expression for stationary sources. Finally, we utilize one of the variational equalities to provide an algorithm for the computation of $R^{na}_{0,n}(D)$.



I. Introduction 

Directed information from a sequence of Random Variables (RVs) $X^n \triangleq \{X_0, X_1, \ldots, X_n\} \in \mathcal{X}_{0,n} \triangleq \times_{i=0}^n \mathcal{X}_i$ to another synchronized sequence $Y^n \triangleq \{Y_0, Y_1, \ldots, Y_n\} \in \mathcal{Y}_{0,n} \triangleq \times_{i=0}^n \mathcal{Y}_i$ is a functional of two collections of nonanticipative or causal conditional distributions $\{P_{X_i|X^{i-1},Y^{i-1}}(\cdot|\cdot,\cdot), P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) : i = 0, 1, \ldots, n\}$, unlike mutual information which is a function of $P_{X^n}$ and $P_{Y^n|X^n}$. Directed information or its variants are used to characterize capacity of channels with memory and feedback [1]-[3], lossy data compression of sequential codes [4], lossy data compression with feedforward information at the decoder [5], lossy data compression of block codes, and capacity of networks [6], etc. Some of the previous references derive coding theorems based on various generalizations.

In this paper, we adopt the mathematical formulation introduced in [7], to define directed information via relative entropy with respect to two consistent families of conditional distributions defined on abstract spaces, and we derive two variational equalities. These are analogous to the variational equalities of mutual information utilized in the Blahut-Arimoto algorithm (BAA), although much more general. Then we define the nonanticipative RDF, $R^{na}_{0,n}(D)$, with respect to directed information and we show its relation to Gorbunov-Pinsker's nonanticipatory $\epsilon$-entropy, $R^{\varepsilon}_{0,n}(D)$. We then proceed to establish existence of the nonanticipative reproduction distribution, and derive its closed form expression for stationary sources. Finally, we invoke one of the variational equalities of directed information to present an algorithm for computing $R^{na}_{0,n}(D)$ similar to the BAA. Throughout, we make extensive use of the functional and topological properties found in [7].

The nonanticipative RDF, $R^{na}_{0,n}(D)$, has the following applications.



(1) It is the optimal performance theoretically attainable (OPTA) by sequential quantizers (see [4]).

(2) It is an upper bound on the classical RDF [8] (which is exact for IID sources and single letter distortion functions).

(3) Its optimal reproduction distribution is a function of the past reproductions, and past and present source symbols, a necessary condition for realizing it via a nonanticipative encoder-channel-decoder to establish joint source-channel matching [9].

The paper is structured as follows. In Section II we construct the two equivalent definitions of nonanticipative channels on abstract spaces, and we define directed information via the information divergence. In Section III we derive the variational equalities of directed information. Finally, in Section IV we discuss an application of the variational equality to the nonanticipative RDF.

II. Nonanticipative Channels on Abstract Spaces and Directed Information

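In this section we define directed information using relative entropy, as a functional of two consistent families of conditional distributions that uniquely define $\{P_{X_i|X^{i-1},Y^{i-1}}(\cdot|\cdot,\cdot) : i = 0, 1, \ldots, n\}$ and $\{P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) : i = 0, 1, \ldots, n\}$, respectively, and vice versa, following [7]. Throughout the paper we assume $\mathcal{X}_n$, $\mathcal{Y}_n$, $n = 0, 1, \ldots$, are Polish spaces.

Notation. Let $\mathbb{N} \triangleq \{0, 1, 2, \ldots\}$, and $\mathbb{N}^n \triangleq \{0, 1, 2, \ldots, n\}$. Introduce two sequences of measurable spaces $\{(\mathcal{X}_n, \mathcal{B}(\mathcal{X}_n)) : n \in \mathbb{N}\}$ and $\{(\mathcal{Y}_n, \mathcal{B}(\mathcal{Y}_n)) : n \in \mathbb{N}\}$, where $\mathcal{B}(\mathcal{X}_n)$ and $\mathcal{B}(\mathcal{Y}_n)$ are the Borel $\sigma$-algebras of subsets of $\mathcal{X}_n$ and $\mathcal{Y}_n$, respectively. Points in $\mathcal{X}^{\mathbb{N}} \triangleq \times_{n \in \mathbb{N}} \mathcal{X}_n$, $\mathcal{Y}^{\mathbb{N}} \triangleq \times_{n \in \mathbb{N}} \mathcal{Y}_n$ are denoted by $\mathbf{x} \triangleq \{x_0, x_1, \ldots\} \in \mathcal{X}^{\mathbb{N}}$, $\mathbf{y} \triangleq \{y_0, y_1, \ldots\} \in \mathcal{Y}^{\mathbb{N}}$, and their restrictions to finite coordinates by $x^n \triangleq \{x_0, x_1, \ldots, x_n\} \in \mathcal{X}_{0,n}$, $y^n \triangleq \{y_0, y_1, \ldots, y_n\} \in \mathcal{Y}_{0,n}$, for $n \in \mathbb{N}$. Let $\mathcal{B}(\mathcal{X}^{\mathbb{N}}) \triangleq \otimes_{i \in \mathbb{N}} \mathcal{B}(\mathcal{X}_i)$, $\mathcal{B}(\mathcal{Y}^{\mathbb{N}}) \triangleq \otimes_{i \in \mathbb{N}} \mathcal{B}(\mathcal{Y}_i)$ denote the $\sigma$-algebras on $\mathcal{X}^{\mathbb{N}}$, $\mathcal{Y}^{\mathbb{N}}$, respectively, generated by cylinder sets. Hence, $\mathcal{B}(\mathcal{X}_{0,n})$ and $\mathcal{B}(\mathcal{Y}_{0,n})$ denote the $\sigma$-algebras of cylinder sets in $\mathcal{X}^{\mathbb{N}}$ and $\mathcal{Y}^{\mathbb{N}}$, respectively, with bases over $A_i \in \mathcal{B}(\mathcal{X}_i)$, $B_i \in \mathcal{B}(\mathcal{Y}_i)$, $i = 0, 1, \ldots, n$, respectively. The set of stochastic kernels on $\mathcal{Y}$ given $\mathcal{X}$ is denoted by $\mathcal{Q}(\mathcal{Y}; \mathcal{X})$.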
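Feedback Channel. Suppose for each $n \in \mathbb{N}$, the distributions $\{p_n(dx_n; x^{n-1}, y^{n-1}) : n \in \mathbb{N}\}$ with $p_0(dx_0; x^{-1}, y^{-1}) \triangleq p_0(dx_0)$ satisfy the following conditions.

i) For $n \in \mathbb{N}$, $p_n(\cdot; x^{n-1}, y^{n-1})$ is a probability measure on $\mathcal{B}(\mathcal{X}_n)$;

ii) For every $A_n \in \mathcal{B}(\mathcal{X}_n)$, $n \in \mathbb{N}$, $p_n(A_n; x^{n-1}, y^{n-1})$ is a $\otimes_{i=0}^{n-1}(\mathcal{B}(\mathcal{X}_i) \otimes \mathcal{B}(\mathcal{Y}_i))$-measurable function of $x^{n-1} \in \mathcal{X}_{0,n-1}$, $y^{n-1} \in \mathcal{Y}_{0,n-1}$.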
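Let $C \in \mathcal{B}(\mathcal{X}_{0,n})$ be a cylinder set of the form $C \triangleq \{\mathbf{x} \in \mathcal{X}^{\mathbb{N}} : x_0 \in C_0, x_1 \in C_1, \ldots, x_n \in C_n\}$, $C_i \in \mathcal{B}(\mathcal{X}_i)$, $i \in \mathbb{N}^n$. Define a family of measures $\mathbf{P}(\cdot|\mathbf{y})$ on $\mathcal{B}(\mathcal{X}^{\mathbb{N}})$ by

\[ \mathbf{P}(C|\mathbf{y}) \triangleq \int_{C_0} p_0(dx_0) \cdots \int_{C_n} p_n(dx_n; x^{n-1}, y^{n-1}) \tag{1} \]

\[ \equiv \overleftarrow{P}_{0,n}(C_{0,n}|y^{n-1}), \quad C_{0,n} = \times_{i=0}^n C_i. \tag{2} \]

The notation $\overleftarrow{P}_{0,n}(\cdot|y^{n-1})$ denotes the restriction of the measure $\mathbf{P}(\cdot|\mathbf{y})$ on cylinder sets $C \in \mathcal{B}(\mathcal{X}_{0,n})$, for $n \in \mathbb{N}$. Thus, if conditions i) and ii) hold then for each $\mathbf{y} \in \mathcal{Y}^{\mathbb{N}}$, the right hand side (RHS) of (1) defines a consistent family of finite-dimensional distributions on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$, and hence there exists a unique measure on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$, from which $p_n(dx_n; x^{n-1}, y^{n-1})$ is obtained. This is the usual definition of a feedback channel, as a family of functions $p_n(dx_n; x^{n-1}, y^{n-1})$ satisfying conditions i) and ii).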
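An alternative, equivalent definition of a feedback channel is established as follows. Consider a family of measures $\mathbf{P}(\cdot|\mathbf{y})$ on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$ satisfying the following consistency condition.

C1: If $E \in \mathcal{B}(\mathcal{X}_{0,n})$, then $\mathbf{P}(E|\mathbf{y})$ is a $\mathcal{B}(\mathcal{Y}_{0,n-1})$-measurable function of $\mathbf{y} \in \mathcal{Y}^{\mathbb{N}}$.

The set of such measures is denoted by $\mathcal{Q}^{C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$. For Polish spaces, it can be shown that for any family of measures $\mathbf{P}(\cdot|\mathbf{y})$ satisfying C1 one can construct a collection of conditional distributions $\{p_n(dx_n; x^{n-1}, y^{n-1}) : n \in \mathbb{N}\}$ satisfying conditions i) and ii) which are connected with $\mathbf{P}(\cdot|\mathbf{y})$ via relation (1).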

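Feedforward Channel. The previous methodology can be repeated for the collection of distributions $\{q_n(dy_n; y^{n-1}, x^n) : n \in \mathbb{N}\}$ which satisfy similar conditions to i) and ii). Similarly as before, define a family of measures $\mathbf{Q}(\cdot|\mathbf{x})$ on $\mathcal{B}(\mathcal{Y}^{\mathbb{N}})$ by

\[ \mathbf{Q}(D|\mathbf{x}) \triangleq \int_{D_0} q_0(dy_0; x_0) \cdots \int_{D_n} q_n(dy_n; y^{n-1}, x^n) \tag{3} \]

\[ \equiv \overrightarrow{Q}_{0,n}(D_{0,n}|x^n), \quad D_{0,n} \in \mathcal{B}(\mathcal{Y}_{0,n}). \tag{4} \]

Then (4) defines a unique measure on $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}))$ from which $\{q_n(dy_n; y^{n-1}, x^n) : n \in \mathbb{N}\}$ is obtained.

An equivalent definition of a feedforward channel is a family of measures $\mathbf{Q}(\cdot|\mathbf{x})$ satisfying the following consistency condition.

C2: If $F \in \mathcal{B}(\mathcal{Y}_{0,n})$, then $\mathbf{Q}(F|\mathbf{x})$ is a $\mathcal{B}(\mathcal{X}_{0,n})$-measurable function of $\mathbf{x} \in \mathcal{X}^{\mathbb{N}}$.

The set of such measures is denoted by $\mathcal{Q}^{C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$. Then, for any family of measures $\mathbf{Q}(\cdot|\mathbf{x})$ on $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}))$ satisfying C2 one can construct a collection of conditional distributions $\{q_n(dy_n; y^{n-1}, x^n) : n \in \mathbb{N}\}$ which are connected with $\mathbf{Q}(\cdot|\mathbf{x})$ via relation (3).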



A. Directed Information Functional 

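Next, we define directed information $I(X^n \rightarrow Y^n)$ using $\mathbf{P}(\cdot|\mathbf{y})$ and $\mathbf{Q}(\cdot|\mathbf{x})$. Given $\mathbf{P}(\cdot|\cdot) \in \mathcal{Q}^{C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$ and $\mathbf{Q}(\cdot|\cdot) \in \mathcal{Q}^{C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$ define:

P1: The joint distribution on $\mathcal{X}^{\mathbb{N}} \times \mathcal{Y}^{\mathbb{N}}$ defined uniquely by

\[ (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(\times_{i=0}^n (A_i \times B_i)), \quad A_i \in \mathcal{B}(\mathcal{X}_i),\ B_i \in \mathcal{B}(\mathcal{Y}_i). \]

P2: The marginal distributions on $\mathcal{X}^{\mathbb{N}}$ defined uniquely for $A_i \in \mathcal{B}(\mathcal{X}_i)$, $i = 0, 1, \ldots, n$, by

\[ \mu_{0,n}(\times_{i=0}^n A_i) = (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(\times_{i=0}^n (A_i \times \mathcal{Y}_i)). \]

P3: The marginal distributions on $\mathcal{Y}^{\mathbb{N}}$ defined uniquely for $B_i \in \mathcal{B}(\mathcal{Y}_i)$, $i = 0, 1, \ldots, n$, by

\[ \nu_{0,n}(\times_{i=0}^n B_i) = (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(\times_{i=0}^n (\mathcal{X}_i \times B_i)). \]

P4: The measure $\overrightarrow{\Pi}_{0,n} : \mathcal{B}(\mathcal{X}_{0,n}) \otimes \mathcal{B}(\mathcal{Y}_{0,n}) \mapsto [0, 1]$ defined uniquely for $A_i \in \mathcal{B}(\mathcal{X}_i)$, $B_i \in \mathcal{B}(\mathcal{Y}_i)$, $i = 0, 1, \ldots, n$, by

\[ \overrightarrow{\Pi}_{0,n}(\times_{i=0}^n (A_i \times B_i)) = (\overleftarrow{P}_{0,n} \otimes \nu_{0,n})(\times_{i=0}^n (A_i \times B_i)). \]

By invoking the definition of directed information [10] and measures P1-P4, it can be shown by repeated application of the chain rule of relative entropy [11] that

\[ I(X^n \rightarrow Y^n) \triangleq \mathbb{D}(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n} \,\|\, \overrightarrow{\Pi}_{0,n}) \tag{5} \]

\[ = \int \log\Big( \frac{\overrightarrow{Q}_{0,n}(dy^n|x^n)}{\nu_{0,n}(dy^n)} \Big) (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) \tag{6} \]

\[ \equiv \mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}). \tag{7} \]

The equivalence between (5) and (6) follows from the Radon-Nikodym Derivative (RND) theorem. The notation $\mathbb{I}_{X^n \rightarrow Y^n}(\cdot, \cdot)$ indicates the functional dependence of $I(X^n \rightarrow Y^n)$ on $\{\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}\}$.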
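For finite alphabets, the integral in (6) reduces to a finite sum, which makes the definition easy to check numerically. The following is a minimal sketch under that assumption; the function name `directed_information` and the callable signatures for the kernels $p_i$ and $q_i$ are hypothetical and chosen only for illustration.

```python
import itertools
import math

def directed_information(p, q, X, Y, n):
    """I(X^n -> Y^n) in nats for finite alphabets X, Y and horizon n.

    p(i, x_i, x_past, y_past) -> probability of x_i given (x^{i-1}, y^{i-1})
    q(i, y_i, y_past, x_upto) -> probability of y_i given (y^{i-1}, x^i)
    Both callables are hypothetical stand-ins for the kernels p_i and q_i.
    """
    joint = {}  # (x^n, y^n) -> joint probability under P (x) Q
    chan = {}   # (x^n, y^n) -> product of the q_i, i.e. Q(y^n | x^n)
    for xs in itertools.product(X, repeat=n + 1):
        for ys in itertools.product(Y, repeat=n + 1):
            pj, qj = 1.0, 1.0
            for i in range(n + 1):
                pj *= p(i, xs[i], xs[:i], ys[:i])
                qj *= q(i, ys[i], ys[:i], xs[:i + 1])
            joint[(xs, ys)] = pj * qj
            chan[(xs, ys)] = qj
    # Marginal nu_{0,n} on y^n (measure P3).
    nu = {}
    for (xs, ys), w in joint.items():
        nu[ys] = nu.get(ys, 0.0) + w
    # Equation (6): expectation of log(Q / nu) under the joint distribution.
    return sum(w * math.log(chan[k] / nu[k[1]])
               for k, w in joint.items() if w > 0)

# Usage: IID uniform binary input, no feedback, binary symmetric channel.
eps = 0.1

def p_iid(i, x, x_past, y_past):
    return 0.5

def q_bsc(i, y, y_past, x_upto):
    return 1 - eps if y == x_upto[-1] else eps

def q_blind(i, y, y_past, x_upto):
    return 0.5  # output independent of the input

di_bsc = directed_information(p_iid, q_bsc, (0, 1), (0, 1), n=1)
di_blind = directed_information(p_iid, q_blind, (0, 1), (0, 1), n=1)
```

In this memoryless, feedback-free special case the directed information coincides with the mutual information, so `di_bsc` should equal twice the single-letter value $\ln 2 - h(\epsilon)$ (in nats), while `di_blind` should vanish.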

III. Variational Equalities of Directed Information

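In this section we derive two variational equalities associated with $I(X^n \rightarrow Y^n)$. First, we recall one of the variational equalities of mutual information $I(X^n; Y^n) \equiv \mathbb{I}_{X^n;Y^n}(P_{X^n}, P_{Y^n|X^n})$ which can be expressed as maximization of relative entropy functionals as follows [12].

Max: Given a channel $P_{Y^n|X^n}(dy^n|x^n)$, a source $P_{X^n}(dx^n)$, and any conditional distribution $P_{X^n|Y^n}(dx^n|y^n)$ then

\[ \mathbb{I}_{X^n;Y^n}(P_{X^n}, P_{Y^n|X^n}) = \sup_{P_{X^n|Y^n}} \int \log\Big( \frac{P_{X^n|Y^n}(dx^n|y^n)}{P_{X^n}(dx^n)} \Big) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \tag{8} \]

and the supremum is achieved at

\[ P^{*}_{X^n|Y^n}(dx^n|y^n) = \frac{P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)}{\int_{\mathcal{X}_{0,n}} P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)}. \]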
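The Max equality (8) can be checked numerically on a small finite example. The sketch below assumes a single-letter setting (the sequences $x^n$, $y^n$ are treated as single symbols) with strictly positive probabilities; the array names and the helper functions are illustrative assumptions, not part of the original development.

```python
import numpy as np

def mutual_information(px, w):
    """I(X;Y) in nats for source px[x] and channel w[x, y] (entries > 0)."""
    joint = px[:, None] * w
    py = joint.sum(axis=0)
    return float(np.sum(joint * np.log(w / py[None, :])))

def max_functional(px, w, post):
    """RHS of (8) without the supremum, for a trial kernel post[x, y] = P(x | y)."""
    joint = px[:, None] * w
    return float(np.sum(joint * np.log(post / px[:, None])))

rng = np.random.default_rng(0)
px = np.array([0.3, 0.7])
w = np.array([[0.9, 0.1], [0.2, 0.8]])             # rows: x, columns: y
joint = px[:, None] * w
bayes = joint / joint.sum(axis=0, keepdims=True)   # maximizer stated after (8)

trial = rng.random((2, 2)) + 0.1
trial /= trial.sum(axis=0, keepdims=True)          # arbitrary P(x | y)

i_xy = mutual_information(px, w)
gap_at_bayes = i_xy - max_functional(px, w, bayes)
gap_at_trial = i_xy - max_functional(px, w, trial)
```

The gap equals the average relative entropy between the Bayes posterior and the trial kernel, so it is zero at the posterior and strictly positive for any other choice.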



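Let $\mathbf{P}(\cdot|\cdot) \in \mathcal{Q}^{C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$ and $\mathbf{Q}(\cdot|\cdot) \in \mathcal{Q}^{C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$, and let $P_{0,n}(dx^n, dy^n) = \overleftarrow{P}_{0,n}(dx^n|y^{n-1}) \otimes \overrightarrow{Q}_{0,n}(dy^n|x^n)$.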

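Let $\mathbf{S}(\cdot|\mathbf{x})$ be any family of measures on $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}))$ satisfying the consistency condition

C3: If $D \in \mathcal{B}(\mathcal{Y}_{0,n})$, then $\mathbf{S}(D|\mathbf{x})$ is a $\mathcal{B}(\mathcal{X}_{0,n-1})$-measurable function of $\mathbf{x} \in \mathcal{X}^{\mathbb{N}}$.

Denote this family of measures by $\mathbf{S}(\cdot|\mathbf{x}) \in \mathcal{Q}^{C3}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$. By Section II, for any family of measures $\mathbf{S}(\cdot|\mathbf{x})$ on $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}))$ satisfying consistency condition C3, there exists a collection $\{s_n(\cdot; \cdot, \cdot) \in \mathcal{Q}(\mathcal{Y}_n; \mathcal{Y}_{0,n-1} \times \mathcal{X}_{0,n-1}) : n \in \mathbb{N}\}$ connected to $\mathbf{S}(\cdot|\mathbf{x})$ by

\[ \mathbf{S}(D|\mathbf{x}) = \int_{D_0} s_0(dy_0) \cdots \int_{D_n} s_n(dy_n; y^{n-1}, x^{n-1}) \equiv \overleftarrow{S}_{0,n}(D_{0,n}|x^{n-1}), \quad D_{0,n} = \times_{i=0}^n D_i \in \mathcal{B}(\mathcal{Y}_{0,n}). \tag{9} \]

Unlike $\overrightarrow{Q}_{0,n}(\cdot|x^n)$ which is conditioned on $x^n \in \mathcal{X}_{0,n}$, the measure $\overleftarrow{S}_{0,n}(\cdot|x^{n-1})$ is conditioned on $x^{n-1} \in \mathcal{X}_{0,n-1}$.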
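Let $\mathbf{R}(\cdot|\mathbf{y})$ be any family of measures on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$ satisfying the consistency condition

C4: If $E \in \mathcal{B}(\mathcal{X}_{0,n})$, then $\mathbf{R}(E|\mathbf{y})$ is a $\mathcal{B}(\mathcal{Y}_{0,n})$-measurable function of $\mathbf{y} \in \mathcal{Y}^{\mathbb{N}}$.

Denote this family of measures by $\mathbf{R}(\cdot|\mathbf{y}) \in \mathcal{Q}^{C4}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$. Similarly as before, for any family of measures $\mathbf{R}(\cdot|\mathbf{y})$ on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$ satisfying consistency condition C4, there exists a collection $\{r_n(\cdot; \cdot, \cdot) \in \mathcal{Q}(\mathcal{X}_n; \mathcal{X}_{0,n-1} \times \mathcal{Y}_{0,n}) : n \in \mathbb{N}\}$ connected to $\mathbf{R}(\cdot|\mathbf{y})$ by

\[ \mathbf{R}(G|\mathbf{y}) = \int_{G_0} r_0(dx_0; y_0) \cdots \int_{G_n} r_n(dx_n; x^{n-1}, y^n) \equiv \overrightarrow{R}_{0,n}(G_{0,n}|y^n), \quad G_{0,n} = \times_{i=0}^n G_i \in \mathcal{B}(\mathcal{X}_{0,n}). \tag{10} \]

Unlike $\overleftarrow{P}_{0,n}(\cdot|y^{n-1})$ which is conditioned on $y^{n-1} \in \mathcal{Y}_{0,n-1}$, the measure $\overrightarrow{R}_{0,n}(\cdot|y^n)$ is conditioned on $y^n \in \mathcal{Y}_{0,n}$. Define another joint distribution on $(\mathcal{X}^{\mathbb{N}} \times \mathcal{Y}^{\mathbb{N}}, \otimes_{i \in \mathbb{N}} \mathcal{B}(\mathcal{X}_i) \otimes \mathcal{B}(\mathcal{Y}_i))$ by $(\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})(dx^n, dy^n)$.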
The next theorem gives the two variational equalities. 

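Theorem 1. (Variational Equalities)

Part A. For any arbitrary measure $\bar{\nu}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$

\[ \mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}) = \inf_{\bar{\nu}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})} \mathbb{D}(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n} \,\|\, \overleftarrow{P}_{0,n} \otimes \bar{\nu}_{0,n}) \tag{11} \]

\[ = \inf_{\bar{\nu}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})} \int \log\Big( \frac{\overrightarrow{Q}_{0,n}(dy^n|x^n)}{\bar{\nu}_{0,n}(dy^n)} \Big) (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) \tag{12} \]

and the infimum in (11) is achieved at $\bar{\nu}^{*}_{0,n}(dy^n) = \int_{\mathcal{X}_{0,n}} (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) \equiv \nu_{0,n}(dy^n)$.

Part B. For any $\mathbf{S}(\cdot|\mathbf{x}) \in \mathcal{Q}^{C3}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$ and $\mathbf{R}(\cdot|\mathbf{y}) \in \mathcal{Q}^{C4}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$

\[ \mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}) = \sup_{\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n}} \int \log\Big( \frac{d(\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})}{d\overrightarrow{\Pi}_{0,n}} \Big) d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n}) \tag{13} \]

and the supremum is achieved when the RND satisfies

\[ \Lambda_{0,n}(x^n, y^n) \triangleq \frac{d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})}{d(\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})} = 1 \text{ -a.s.}, \quad n \in \mathbb{N}. \tag{14} \]

Equivalently, for $i = 0, 1, \ldots, n$,

\[ \Lambda_i(x^i, y^i) \triangleq \frac{p_i(dx_i; x^{i-1}, y^{i-1}) \otimes q_i(dy_i; y^{i-1}, x^i)}{s_i(dy_i; y^{i-1}, x^{i-1}) \otimes r_i(dx_i; x^{i-1}, y^i)} = 1 \text{ -a.s.} \tag{15} \]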

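Proof: Part A. By definition:

\[ \mathbb{D}(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n} \,\|\, \overleftarrow{P}_{0,n} \otimes \bar{\nu}_{0,n}) = \int \log\Big( \frac{\overrightarrow{Q}_{0,n}(dy^n|x^n)}{\bar{\nu}_{0,n}(dy^n)} \Big) (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) \tag{16} \]

\[ = \int \log\Big( \frac{\overrightarrow{Q}_{0,n}(dy^n|x^n)}{\nu_{0,n}(dy^n)} \Big) (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) + \mathbb{D}(\nu_{0,n} \,\|\, \bar{\nu}_{0,n}) \tag{17} \]

\[ \geq \mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}). \tag{18} \]

Moreover, equality holds in (18) when $\bar{\nu}_{0,n} = \nu_{0,n}$.

Part B. Consider the difference between $I(X^n \rightarrow Y^n) = \mathbb{D}(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n} \,\|\, \overrightarrow{\Pi}_{0,n})$ given by (5) and the RHS of (13) (without the supremum). Then

\[ \int \log\Big( \frac{d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})}{d\overrightarrow{\Pi}_{0,n}} \Big) d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n}) - \int \log\Big( \frac{d(\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})}{d\overrightarrow{\Pi}_{0,n}} \Big) d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n}) \]

\[ = \int \log\Big( \frac{d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})}{d(\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})} \Big) d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n}) \geq \int \Big( 1 - \frac{d(\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})}{d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})} \Big) d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n}) \geq 0 \]

where the first inequality follows from $\log x \geq 1 - \frac{1}{x}$, $x > 0$, which holds with equality if and only if $x = 1$. Furthermore, the inequality becomes equality when the RND $\Lambda_{0,n}(x^n, y^n) = \frac{d(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})}{d(\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})} = 1$, $\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n}$-a.s. in $(x^n, y^n)$. Since $(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}) = (\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n})(\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}) = 1$, this condition is equivalent to $\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n} = \overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n}$. By conditioning (14) on $\mathcal{B}(\mathcal{X}_{0,n-1}) \otimes \mathcal{B}(\mathcal{Y}_{0,n-1})$ one obtains (15). ■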
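The decomposition used in Part A of the proof, namely that the objective in (12) exceeds the directed information by exactly $\mathbb{D}(\nu_{0,n} \| \bar{\nu}_{0,n})$, can be verified numerically. The sketch below assumes a finite, feedback-free setting in which $x^n$ and $y^n$ are flattened to single indices; the specific source `mu`, kernel `q` and trial measure `nu_bar` are arbitrary illustrative choices.

```python
import numpy as np

def kl(a, b):
    """Relative entropy D(a || b) in nats for strictly positive vectors."""
    return float(np.sum(a * np.log(a / b)))

mu = np.array([0.5, 0.3, 0.2])                        # source (assumed)
q = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])    # kernel q[x, y] (assumed)
joint = mu[:, None] * q
nu = joint.sum(axis=0)                                # true output marginal

def objective(nu_bar):
    """Integral of log(q / nu_bar) under the joint, as in (12) before the infimum."""
    return float(np.sum(joint * np.log(q / nu_bar[None, :])))

info = objective(nu)                  # value at the minimizer nu_bar = nu
nu_bar = np.array([0.25, 0.75])       # arbitrary trial output distribution
lhs = objective(nu_bar)
rhs = info + kl(nu, nu_bar)
```

Since relative entropy is nonnegative and vanishes only when its arguments coincide, `lhs` can never fall below `info`, which is the content of (18).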
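Discussion. Next, we discuss the relation between the variational equality of directed information given by (13) and the variational equality of mutual information given by (8). Clearly, (8) is also equivalent to (since $P_{Y^n}$ is fixed)

\[ \sup_{P_{X^n|Y^n}} \int \log\Big( \frac{P_{X^n|Y^n}(dx^n|y^n) \otimes P_{Y^n}(dy^n)}{P_{X^n}(dx^n) \times P_{Y^n}(dy^n)} \Big) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \tag{19} \]

since the RND in (19) is another version of the one in (8). Thus, (13) is the analogue of (19), in which the directed information functional is utilized together with the decomposition $\overleftarrow{S}_{0,n} \otimes \overrightarrow{R}_{0,n}$ of the joint distribution. Suppose $q_i(\cdot; y^{i-1}, x^i) \ll s_i(\cdot; y^{i-1}, x^{i-1})$, $\forall i$. Then from (15)

\[ r_i(dx_i; x^{i-1}, y^i) = \frac{q_i(dy_i; y^{i-1}, x^i) \otimes p_i(dx_i; x^{i-1}, y^{i-1})}{\int_{\mathcal{X}_i} q_i(dy_i; y^{i-1}, x^i) \otimes p_i(dx_i; x^{i-1}, y^{i-1})}. \]

The previous expression is the analogue of the maximizing distribution $P^{*}_{X^n|Y^n}$ in (8). Finally, we note that the optimization in (13) can be done by keeping $\overleftarrow{S}_{0,n}$ fixed, and generated by $\mathbf{P}(\cdot|\cdot) \in \mathcal{Q}^{C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$ and $\mathbf{Q}(\cdot|\cdot) \in \mathcal{Q}^{C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$.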

IV. Application to Nonanticipative RDF 

For the rest of the paper we focus on developing an 
algorithm to compute the nonanticipative RDF. 
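First, we recall Gorbunov-Pinsker's definition of nonanticipatory $\epsilon$-entropy [13]. Introduce the measurable distortion function by $d_{0,n}(x^n, y^n) : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty)$, $d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^n \rho_{0,i}(x^i, y^i)$, and let $d_{0,n}(x^n, y^n) = \sum_{i=0}^n \rho(x_i, y_i)$ for single letter distortion. Introduce the fidelity set by

\[ \mathcal{Q}_{0,n}(D) \triangleq \Big\{ P_{Y^n|X^n}(dy^n|x^n) : \frac{1}{n+1} \int d_{0,n}(x^n, y^n) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \leq D \Big\}. \]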

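Gorbunov and Pinsker restricted the set $\mathcal{Q}_{0,n}(D)$ to those reproduction distributions which satisfy the Markov chain (MC) $X_{n+1}^{\infty} \leftrightarrow X^n \leftrightarrow Y^n \Leftrightarrow P_{Y^n|X^{\infty}}(dy^n|x^{\infty}) = P_{Y^n|X^n}(dy^n|x^n)$-a.s., $\forall n \geq 0$. Then they introduced the nonanticipatory $\epsilon$-entropy defined by

\[ R^{\varepsilon}_{0,n}(D) \triangleq \inf_{\substack{P_{Y^n|X^n} \in \mathcal{Q}_{0,n}(D):\ X_{i+1}^n \leftrightarrow X^i \leftrightarrow Y^i, \\ i = 0, 1, \ldots, n-1}} I(X^n; Y^n). \tag{20} \]

Thus, the difference between the classical RDF [14] and the nonanticipatory $\epsilon$-entropy (20) is the presence of the MC, which implies that for each $i$, $Y_i$ is a function of the past and present source symbols $\{X_0, X_1, \ldots, X_i\}$, and independent of the future source symbols $\{X_{i+1}, \ldots, X_n\}$. It can be shown that the MC $X_{i+1}^n \leftrightarrow X^i \leftrightarrow Y^i$, $i = 0, 1, \ldots, n-1$, is equivalent to $P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \triangleq \otimes_{i=0}^n P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1}, x^i)$-a.s. Utilizing this MC, then

\[ I(X^n; Y^n) = \int \log\Big( \frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)} \Big) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \equiv \mathbb{I}_{X^n \rightarrow Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \]

where the notation $\mathbb{I}_{X^n \rightarrow Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is used to point out the functional dependence on $\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$. Then the definition of nonanticipatory $\epsilon$-entropy (20) is equivalent to

\[ R^{na}_{0,n}(D) \triangleq \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} \mathbb{I}_{X^n \rightarrow Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \tag{21} \]

\[ \overrightarrow{\mathcal{Q}}_{0,n}(D) \triangleq \Big\{ \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) : \frac{1}{n+1} \int d_{0,n}(x^n, y^n) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \leq D \Big\}. \]

We call (21) the nonanticipative RDF. Next, we introduce some assumptions and we establish existence of the infimum in (21).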

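Assumption 1. (Main assumptions)

(1) $\mathcal{Y}_{0,n}$ is a compact Polish space, $\mathcal{X}_{0,n}$ is a Polish space;

(2) for all $h(\cdot) \in BC(\mathcal{Y}_n)$, the function mapping $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1} \mapsto \int_{\mathcal{Y}_n} h(y) P_{Y|Y^{n-1},X^n}(dy|y^{n-1}, x^n) \in \mathbb{R}$ is continuous jointly in $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1}$;

(3) $d_{0,n}(x^n, \cdot)$ is continuous on $\mathcal{Y}_{0,n}$;

(4) There exist $(x^n, y^n) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}$ such that $d_{0,n}(x^n, y^n) < D$.

Note that since $\mathcal{Y}_{0,n}$ is assumed to be a compact Polish space, then by [11] probability measures on $\mathcal{Y}_{0,n}$ are weakly compact. Moreover, the following result can be obtained, which we will use to show existence of the infimum in (21).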

Lemma 1. [15] Suppose Assumption 1 (1), (2) hold. Then

(1) The set $\mathcal{Q}^{C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is weakly compact.

(2) $\mathbb{I}_{X^n \rightarrow Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is lower semicontinuous on $\mathcal{Q}^{C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ for a fixed $P_{X^n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$.

(3) Under the additional Assumption 1 (3), (4) the set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is a closed subset of $\mathcal{Q}^{C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$.

The next theorem establishes existence of the minimizing reproduction distribution for (21).

Theorem 2. (Existence [15]) Suppose Assumption 1 holds. Then the infimum in (21) is achieved and $R^{na}_{0,n}(D)$ is finite.

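By invoking Gorbunov and Pinsker [13, Theorems 3, 4], for a stationary source and single letter distortion, $\lim_{n \rightarrow \infty} R^{na}_{0,n}(D)$ exists, it is finite, and the optimal reproduction distribution is realizable by stationary source-reproduction pairs $\{(X_i, Y_i) : i = 0, 1, \ldots, n\}$. Hence, the $(n+1)$-fold convolution conditional distribution $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^n P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1}, x^i)$-a.s., is a convolution of stationary conditional distributions. Next, we give the solution of $R^{na}_{0,n}(D)$. By utilizing the Lagrange duality theorem [16] we obtain the unconstrained problem

\[ R^{na}_{0,n}(D) = \sup_{s \leq 0} \inf_{\overrightarrow{P}_{Y^n|X^n} \in \mathcal{Q}^{C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})} \Big\{ \mathbb{I}_{X^n \rightarrow Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) - s\big( \ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) - D(n+1) \big) \Big\} \tag{22} \]

where $\ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) \triangleq \int d_{0,n}(x^n, y^n) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)$ denotes the total average distortion. Note that $\overrightarrow{P}_{Y^n|X^n} \in \mathcal{Q}^{C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ are probability measures; therefore, one should introduce another set of Lagrange multipliers to obtain an unconstrained problem free of such a constraint. For the rest of the paper, we consider $d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^n \rho(T^i x^n, T^i y^n)$, where $T^i x^n$ is the shift operator on $x^n$ (similarly for $T^i y^n$). Then by computing the Gateaux differential of (22) we obtain the following (see [15]).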
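(1) The infimum in (22) is attained at $\overrightarrow{P}^{*}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)$ given by

\[ \overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^n \frac{e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}{\int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})} \tag{23} \]

where $s \leq 0$ and $P^{*}_{Y_i|Y^{i-1}}(\cdot|\cdot) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1})$.

(2) The nonanticipative RDF is given by

\[ R^{na}_{0,n}(D) = sD(n+1) - \sum_{i=0}^n \int \log\Big( \int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1}) \Big) \overrightarrow{P}^{*}_{Y^{i-1}|X^{i-1}}(dy^{i-1}|x^{i-1}) \otimes P_{X^i}(dx^i). \]

(3) If $R^{na}_{0,n}(D) > 0$ then $s < 0$ and the fidelity constraint holds with equality:

\[ \frac{1}{n+1} \int d_{0,n}(x^n, y^n) \overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) = D. \]

Utilizing (1)-(3) and the variational equality of Theorem 1, Part A, we have the following theorem.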

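Theorem 3. (Double Minimization)

a) For $s \leq 0$, the nonanticipative RDF, $R^{na}_{0,n}(D)$, can be expressed as a double minimization as follows:

\[ R^{na}_{0,n}(D) = sD(n+1) + \min_{\bar{\nu}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})} \min_{\overrightarrow{Q}_{0,n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} \Big\{ \int \log\Big( \frac{\overrightarrow{Q}_{0,n}(dy^n|x^n)}{\bar{\nu}_{0,n}(dy^n)} \Big) (\mu_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) - s \int d_{0,n}(x^n, y^n) (\mu_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) \Big\}. \tag{24} \]

b) For fixed $\overrightarrow{Q}_{0,n}$, the 2nd term of the RHS of (24) is minimized by

\[ \bar{\nu}^{*}_{0,n}(dy^n) = \int_{\mathcal{X}_{0,n}} (\mu_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n). \]

c) For fixed $\bar{\nu}_{0,n}$, the 2nd term of the RHS of (24) is minimized by

\[ \overrightarrow{Q}^{\bar{\nu}}_{0,n}(dy^n|x^n) = \otimes_{i=0}^n \frac{e^{s\rho(T^i x^n, T^i y^n)} \bar{\nu}_{Y_i|Y^{i-1}}(dy_i; y^{i-1})}{\int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)} \bar{\nu}_{Y_i|Y^{i-1}}(dy_i; y^{i-1})}. \]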

Proof: Parts a) and b) follow from Theorem 1, Part A, while c) follows from calculus of variations and the necessary conditions for an optimal solution. ■
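The two partial minimizations in parts b) and c) can be illustrated numerically in the simplest case $n = 0$ with finite alphabets, where the bracketed term of the double minimization becomes a finite sum. The sketch below is written under those assumptions; the helper names `objective`, `output_marginal` and `tilted_kernel` are hypothetical, and the exponentially tilted form used for part c) is the standard single-letter minimizer, taken here as an assumption about what the general expression reduces to.

```python
import numpy as np

def objective(mu, kern, nu_bar, rho, s):
    """Bracketed term for n = 0: E[log(kern / nu_bar)] - s * E[rho]."""
    joint = mu[:, None] * kern
    return float(np.sum(joint * (np.log(kern / nu_bar[None, :]) - s * rho)))

def output_marginal(mu, kern):
    """Part b): minimizing nu_bar for a fixed kernel is the output marginal."""
    return mu @ kern

def tilted_kernel(nu_bar, rho, s):
    """Part c) for n = 0: kern[x, y] proportional to exp(s * rho[x, y]) * nu_bar[y]."""
    w = np.exp(s * rho) * nu_bar[None, :]
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
mu = np.array([0.6, 0.4])                     # source distribution (assumed)
rho = np.array([[0.0, 1.0], [1.0, 0.0]])      # Hamming distortion (assumed)
s = -1.5
kern = rng.random((2, 2)) + 0.1
kern /= kern.sum(axis=1, keepdims=True)       # arbitrary starting kernel P(y | x)
nu_bar = np.array([0.3, 0.7])                 # arbitrary starting output measure

f_start = objective(mu, kern, nu_bar, rho, s)
f_after_b = objective(mu, kern, output_marginal(mu, kern), rho, s)
f_after_c = objective(mu, tilted_kernel(nu_bar, rho, s), nu_bar, rho, s)
```

Each partial update can only lower the objective, which is what makes alternating between the two updates a descent scheme.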
We now establish the following generalization of the BAA for the nonanticipative RDF.

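Theorem 4. Let the parameter $s \leq 0$ be given. Let $\bar{\nu}^{0}_{0,n}$ be any probability measure which is positive. Let $\bar{\nu}^{r+1}_{0,n}$ be given in terms of $\bar{\nu}^{r}_{0,n}$ by

\[ \bar{\nu}^{r+1}_{0,n}(dy^n) = \int_{\mathcal{X}_{0,n}} \otimes_{i=0}^n \frac{A_i \, \bar{\nu}^{r}_{Y_i|Y^{i-1}}(dy_i; y^{i-1})}{\int_{\mathcal{Y}_i} A_i \, \bar{\nu}^{r}_{Y_i|Y^{i-1}}(dy_i; y^{i-1})} \otimes \mu_{0,n}(dx^n) \]

where $A_i = e^{s\rho(T^i x^n, T^i y^n)}$. Then

\[ D(\overrightarrow{Q}_{0,n}(\bar{\nu}^{r}_{0,n})) \longrightarrow D_s, \quad \text{as } r \rightarrow \infty, \]

\[ \mathbb{I}_{X^n \rightarrow Y^n}(\mu_{0,n}, \overrightarrow{Q}_{0,n}(\bar{\nu}^{r}_{0,n})) \longrightarrow R^{na}_{0,n}(D_s), \quad \text{as } r \rightarrow \infty, \]

where $\overrightarrow{Q}_{0,n}(\bar{\nu}^{r}_{0,n})$ is the minimizing kernel of Theorem 3 c) evaluated at $\bar{\nu}^{r}_{0,n}$, $D(\cdot)$ is its average distortion, and $(D_s, R^{na}_{0,n}(D_s))$ is a point on the curve $R^{na}_{0,n}(D)$ parametrized by $s$.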

Proof: The derivation utilizes Theorem 3 and [12]. ■
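To make the iteration concrete, the following sketch treats only the special case $n = 0$ with finite alphabets, in which the recursion is assumed to reduce to the classical single-letter Blahut iteration for the RDF. It is not an implementation of the general sequence-level algorithm; the function name `blahut_arimoto_rdf` and its arguments are hypothetical.

```python
import numpy as np

def blahut_arimoto_rdf(px, rho, s, nu0, iters=200):
    """Classical Blahut iteration at slope s <= 0 (n = 0 special case).

    px[x]      source distribution
    rho[x, y]  single letter distortion
    nu0[y]     strictly positive starting output distribution
    Returns (D_s, R(D_s)) with the rate in nats.
    """
    a = np.exp(s * rho)                      # A[x, y] = exp(s * rho(x, y))
    nu = np.asarray(nu0, dtype=float)
    for _ in range(iters):
        z = a @ nu                           # z[x] = sum_y A[x, y] * nu[y]
        nu = nu * ((px / z) @ a)             # nu_{r+1}[y] = nu_r[y] * sum_x px[x] A[x, y] / z[x]
    z = a @ nu
    kern = a * nu[None, :] / z[:, None]      # tilted reproduction kernel P(y | x)
    joint = px[:, None] * kern
    dist = float(np.sum(joint * rho))
    rate = float(np.sum(joint * np.log(kern / nu[None, :])))
    return dist, rate

# Usage: uniform binary source with Hamming distortion.
px = np.array([0.5, 0.5])
rho = np.array([[0.0, 1.0], [1.0, 0.0]])
s = -2.0
d_s, r_s = blahut_arimoto_rdf(px, rho, s, nu0=[0.8, 0.2])
```

For this source the rate distortion function is known in closed form, $R(D) = \ln 2 - h(D)$ nats with $D_s = e^{s}/(1 + e^{s})$, which gives a convenient check that the iterates approach a point on the curve.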

V. Conclusion 

In this paper we derive two variational equalities for directed information defined over consistent families of conditional distributions on abstract spaces. Then we show existence of the reproduction distribution which achieves the infimum of the nonanticipative RDF, and we use the variational equality to obtain a BAA for the nonanticipative RDF. In the final paper we will apply the BAA to specific examples.